README: Replication Package for "Preventing School Dropout at Scale: Experimental Evidence from Guatemala"

Authors  
Melissa Adelman (World Bank)  
Francisco Haimovich (World Bank)  
Mauricio Romero (ITAM; J-PAL)  
Emmanuel Vazquez (CEDLAS; UNLP)  

Corresponding author: Mauricio Romero (mtromero@itam.mx)
AEA RCT Registry: https://www.socialscienceregistry.org/trials/4091

-------------------------------------------------------------------------------

OVERVIEW

This repository contains the code used to clean and analyze the data and to replicate the results in our paper on a large-scale school dropout prevention program in Guatemala. The code is written in Stata (.do files), supplemented by R scripts. All analysis was conducted in Stata 18; R 4.4.2 was used to create some figures and maps.

FOLDER STRUCTURE

Main directory:
- 00_master.do .............. Master script calling all steps in order.
- 01X_*.do .................. Raw data processing (student records, scores).
- 02X_*.do .................. Model estimation for the early warning system (SAT, Sistema de Alerta Temprana).
- 03A_Randomization.do ...... School-level randomization with risk-based stratification.
- 04X_*.do .................. Processing of baseline and follow-up survey data.
- 05X_*.do .................. Balance checks across treatment arms.
- 06X_*.do .................. Main outcome estimation (ITT, LATE, event studies).
- 07X_*.do .................. Mechanisms and mediators.
- 08X_*.do .................. Heterogeneity analysis.
- 09_*.do ................... Figure generation (e.g., dropout over time).
- 10_*.do ................... Regression discontinuity (RD) specifications.
- 11–12_*.do ................ Spillover estimation.
- programs/ ................. Custom programs for table formatting.
- RCode/ .................... R scripts for maps and visualizations.

-------------------------------------------------------------------------------

KEY FILES & SUMMARIES

• 00_master.do
  - Calls all key scripts in sequence to reproduce the full analysis.

• 01A_BasesRand.do
  - Cleans 2017 sixth-grade academic and dropout records.

• 01B_BasesModeloSAT.do
  - Processes 2016 fifth-grade records for prediction models.

• 01C_procesa_base.do
  - Generates analytic variables; defines dropout.

• 02A_estima_sat.do / 02B_estima_sat_2017.do
  - Predicts dropout risk using fixed-effects regression.

• 03A_Randomization.do
  - Implements treatment randomization by school, using strata based on dropout risk.

• 04A–04H_*.do
  - Cleans and merges survey data:
    * 04A: Baseline
    * 04B: Reading test
    * 04C–04G: Follow-up surveys
    * 04F: Combines with enrollment data
    * 04H: Builds triple-dataset for placebo checks

• 04I_RegReady.do / 04I_TakeUpData.do / 04J–04K.do
  - Prepares the final cleaned datasets for analysis and explores attrition and schooling history.

• 05A–05D_*.do
  - Balance checks across arms and alternative sample definitions.

• 06B_estima_itt_Treatments.do
  - Main ITT (intent-to-treat) estimation.

• 06C_estima_LATE_Treatments.do
  - LATE (local average treatment effect) estimates using take-up/information treatment.

• 06D–06G_*.do
  - Event study graphs and year-over-year dropout analysis.

• 06H_EffectYearsSchooling.do
  - Estimates program impact on years of schooling.

• 07A–07B_*.do
  - Tests for mediation via principal/teacher behavior.

• 08A–08B_*.do
  - Heterogeneity analysis by school type, student characteristics, etc.

• 09_DroputOverTime._Figure_Catchupdo.do
  - Generates long-term dropout trajectory graphs.

• 10_RD_Estimates.do
  - RD-style sensitivity tests.

• 11_Spillovers.do / 12_SpilloversGuide.do
  - Estimates potential spillovers from treated to control schools.

• programs/TvC_Tables.do
  - Defines custom Stata routines for formatted comparison tables.

• RCode/
  - 01_PlotStats.R .......... Visualize summary statistics.
  - 02_Maps.R ............... Generate maps of treatment schools.
  - 03_NearbySchools.R ...... Analyze geographic proximity between schools.
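
As a rough illustration of the stratified school-level randomization that 03A_Randomization.do is described as implementing, the sketch below assigns simulated schools to two arms within quartiles of a predicted risk score. It is not the authors' code: the variable names (`school_id`, `risk_hat`, `stratum`, `treat`), the seed, the number of strata, and the two-arm design are all assumptions made for illustration.

```stata
* Hypothetical sketch: school-level randomization stratified by dropout risk.
* Simulated data; every name and parameter here is an assumption.
clear
set seed 12345
set obs 200
gen school_id = _n
gen double risk_hat = runiform()      // stand-in for predicted dropout risk

* Strata: quartiles of the predicted risk score
xtile stratum = risk_hat, nq(4)

* Within each stratum, sort schools by a random draw and
* assign the first half (rounded down) to treatment
gen double u = runiform()
bysort stratum (u): gen byte treat = (_n <= _N/2)

* Check the allocation across strata
tabulate stratum treat
```

Fixing the seed before any random draw keeps the assignment reproducible across runs. The actual strata definition and number of treatment arms should be read from 03A_Randomization.do.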

-------------------------------------------------------------------------------

REPLICATION INSTRUCTIONS

1. Set the correct local paths in `00_master.do`.
2. Run `00_master.do` to execute the entire analysis pipeline.
3. Outputs (e.g., regression tables, plots) are saved to the output folders defined in the code.

To replicate specific analyses, run the corresponding script (e.g., `06B_estima_itt_Treatments.do` for main results).
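
A minimal sketch of what setting local paths and running a single step might look like, assuming paths are stored in Stata globals. The global names (`root`, `code`, `output`) and the folder layout are hypothetical; use whatever names actually appear in `00_master.do`.

```stata
* Hypothetical path setup; global names and folders are assumptions,
* not necessarily those used in 00_master.do.
clear all
set more off

global root   "C:/path/to/replication_package"   // edit for your machine
global code   "$root"                            // folder holding the .do files
global output "$root/output"                     // folder for tables and figures

capture mkdir "$output"      // create the output folder if it does not exist

* Run one step instead of the full pipeline (file name taken from this README)
do "$code/06B_estima_itt_Treatments.do"
```

Later scripts may depend on datasets built by earlier ones, so running a single script in isolation is likely to work only after the preceding data-preparation steps have been run at least once.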



-------------------------------------------------------------------------------

DATA ACCESS

We use confidential administrative data from the Ministry of Education of Guatemala (MINEDUC), including:

- Annual enrollment and grade progression (2013–2022)  
- School-level characteristics  
- Student-level demographics and performance

Because of privacy agreements with MINEDUC, the raw data cannot be shared publicly. Researchers may request access from MINEDUC.

Mock data files are not included in this package, but variable names and structures are documented throughout the .do files for replication with authorized data.


-------------------------------------------------------------------------------

SOFTWARE REQUIREMENTS

- Stata 15 or newer (the analysis was run in Stata 18)
- Stata packages: `reghdfe`, `ivreg2`, `ftools`, `moremata`, `binsreg`, `ivreghdfe`
- R, needed only for maps and plots (R 4.4.2 was used), with standard packages (`ggplot2`, `sf`, etc.)
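
The Stata packages can typically be installed once from within Stata. The loop below is a sketch that assumes each package is distributed through SSC under the name shown. `ranktest` is not in the list above; it is added here because `ivreg2` depends on it.

```stata
* One-time installation of the user-written Stata packages.
* Assumes each package is available from SSC under the name shown.
* ranktest is an addition to the README list (required by ivreg2).
foreach pkg in ftools reghdfe ranktest ivreg2 ivreghdfe moremata binsreg {
    ssc install `pkg', replace
}
```

If a package is not found on SSC, check its own documentation for the current installation source.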

-------------------------------------------------------------------------------

For questions, contact Mauricio Romero (mtromero@itam.mx).

-------------------------------------------------------------------------------

CITATION

If using any code or structure from this archive, please cite:

Adelman, M., Haimovich, F., Romero, M., & Vazquez, E. (2025). Preventing School Dropout at Scale: Experimental Evidence from Guatemala. *Journal of Labor Economics*. Forthcoming.
